SUMMARY


The ARPA Computer Network will provide communication paths between
a set of computer centers distributed across the United States.
The purpose of the network is to make the special capabilities of
each computer center available, inexpensively and rapidly, to all
of the network's users. The ARPA contract with the Network Analysis
Corporation involves the analysis and design of this network and a
study of the properties of networks of this type.

The  objectives  of  the  project  are: 

(1) To develop computer programs which can determine economical
data line locations and line capacities.

(2)  To  operate  the  programs  in  order  to  determine  the 
appropriate  data  lines  to  be  leased  from  AT&T,  and  the  cost- 
throughput-time  delay  characteristics  for  store-and-forward 
networks  such  as  the  ARPA  Network. 

(3) To study the properties of large store-and-forward computer
networks and to develop a specific example exhibiting the
cost-throughput characteristics of a large network.

(4) To study the effect on network performance of alternate routing
procedures and the amount of storage at each node.

Each network to be designed must satisfy a number of constraints.
It must be reliable, it must be able to accommodate variations in
traffic flow without significant degradation in performance, and it
must be capable of efficient expansion when new nodes and links are
added at a later date. Each design must have an average response
time for short messages no greater than 0.2 seconds. The goal of
the optimization is to satisfy all of the constraints with the
least possible cost per bit of transmitted information.

Objectives (1) and (2) of the project have been completed and
objective (3) will be completed shortly. An operational computer
program has been developed. This program is capable of: (1)
analyzing proposed network designs, and (2) finding economical
combinations of lines which lead to highly efficient low cost
network designs. The general design philosophy followed, as well as
the specific elements considered in the implementation of the
program, are described in the report. The computer program was used
to determine the most economical lines which can be leased to
satisfy the communication requirements of the ARPA Network. It has
also been used to study the relationships between traffic level,
link capacities, and cost as a function of the number of nodes in
the network. Extensive studies have been made for twelve, sixteen,
eighteen, and twenty node networks where each node was a potential
site for the ARPA Network. A number of the results of these studies
are summarized in this report.


Highly efficient algorithms have been developed and programmed for
the study of large computer networks. Methods for optimizing the
design of centralized networks have been discovered. These methods,
which are described here, are presently being used to design large
decentralized hierarchical networks.

The problem of routing is under investigation and preliminary
results will be reported shortly.

INTRODUCTION

The ARPA Network will provide store-and-forward communication paths
between a set of computer centers distributed across the
continental United States. The message handling tasks at each node
in the network are performed by a special purpose Interface Message
Processor (IMP) located at each computer center. The centers will
be interconnected through the IMPs by full duplex telephone lines,
typically of 50 kilobit/sec capacity.

When a message is ready for transmission, it will be broken up into
a set of packets, each with appropriate header information. Each
packet will independently make its way through the network to its
destination. When a packet is transmitted between any pair of
nodes, the transmitting IMP must receive a positive acknowledgment
from the receiving IMP within a given interval of time. If this
acknowledgment is not received, the packet will be retransmitted,
either over the same or a different channel depending on the
network routing doctrine being employed.

One of the design goals of the system is to achieve a response time
of less than 0.2 seconds for short messages. A measure of the
efficiency with which this criterion is met is the cost per bit of
information transmitted through the network when the total network
traffic is at the level which yields a 0.2 second average time
delay. The goal of the network design is to achieve the required
response time with the least possible cost per bit.

The final network design is subject to a number of additional
constraints. It must be reliable, it must have reasonably flexible
capacity in order to accommodate variations in traffic flow without
significant degradation in performance, and it must be neatly
expandable so that additional nodes and links can be added at later
dates. The sequence and allowable variations with which the nodes
are added to the network must also be taken into account. At any
stage in the evolution of the network, there must be at least one
communication path between any pair of nodes that have already been
activated. In order to achieve a reasonable level of reliability,
the network must be designed so that at least two nodes and/or
links must fail before the network becomes disconnected.
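
The reliability requirement above can be tested mechanically on any
proposed topology. What follows is a minimal sketch, not the program
described in this report: the function names and the brute-force
method (remove each link or node in turn and re-test connectivity)
are illustrative assumptions.

```python
from collections import deque

def connected(nodes, links):
    """Return True if every node in `nodes` can reach every other over `links`."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adj = {v: set() for v in nodes}
    for u, v in links:
        # Links touching a removed (failed) node are ignored.
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == nodes

def survives_single_failure(nodes, links):
    """True if no single node failure and no single link failure disconnects the network."""
    nodes = list(nodes)
    links = list(links)
    if not connected(nodes, links):
        return False
    for i in range(len(links)):
        if not connected(nodes, links[:i] + links[i + 1:]):
            return False
    for v in nodes:
        if not connected([u for u in nodes if u != v], links):
            return False
    return True

ring = [(1, 2), (2, 3), (3, 4), (4, 1)]
chain = [(1, 2), (2, 3), (3, 4)]
print(survives_single_failure([1, 2, 3, 4], ring))   # True
print(survives_single_failure([1, 2, 3, 4], chain))  # False
```

A ring passes because every node keeps a second route; a chain fails
because losing any interior node or any link splits it.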

To plan the orderly growth of the network, it is necessary to
predict the behavior of proposed network designs. To do this,
traffic flows must be projected and network routing procedures
specified. The time delay analysis problem has been studied by
Kleinrock [1, 2], who considered several mathematical models of the
ARPA Network. Kleinrock's comparison of his analysis with computer
simulations indicates that network behavior can be qualitatively
predicted with reasonable confidence. However, additional study in
this area is needed before all the significant parameters which
describe the system can be incorporated into the model. For the
present, it appears that a combination of analysis and simulation
can best be applied to determine a specific network's behavior.

Even  if  a  proposed  network  can  be  accurately  analyzed, 
the  most  economical  networks  which  satisfy  all  of  the  constraints 
are  not  easily  found.  This  is  because  of  the  enormous  number  of 
combinations  of  links  that  can  be  used  to  connect  a  relatively 
small  number  of  nodes.  It  is  not  possible  to  examine  even  a  small 
fraction  of  the  possible  network  topologies  that  might  lead  to 
economical  designs.  In  fact,  the  direct  enumeration  of  all  such 
configurations for a twenty node network is beyond the capabilities
of the most powerful present day computer.
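
A rough count shows why enumeration is hopeless. The sketch below
assumes, for illustration only, that a topology is any choice of
which node pairs are joined by a link; it ignores the design
constraints and parallel lines, so it is not the report's own
count. Under that assumption twenty nodes give 190 candidate links
and 2^190, or roughly 1.6 x 10^57, topologies.

```python
from math import comb

def candidate_link_count(n):
    """Number of distinct node pairs, i.e. possible links, among n nodes."""
    return comb(n, 2)

def topology_count(n):
    """Number of ways to choose a subset of the candidate links."""
    return 2 ** candidate_link_count(n)

for n in (12, 16, 18, 20):
    print(n, candidate_link_count(n), f"{topology_count(n):.3e}")
```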

TOPOLOGICAL OPTIMIZATION

As part of NAC's study of computer network design, a computer
program was developed to find low cost topologies which satisfy the
constraints on network time delay, reliability, congestion, and
other performance parameters. This program is structured to allow
the network designer to rapidly investigate the tradeoffs between
average time delay per message, network cost, and other factors of
interest.

The  inputs  to  the  program  are: 

1.  Existing  network  configuration  (i.e.,  lines 
and  nodes  already  installed  and  ordered) 

2.  Estimated  traffic  between  nodes 

3.  Maximum  average  delay  desired  for  short 
messages 

In  addition,  the  user  may  specify  to  the  program  a  maximum  cost 
that  no  network  design  will  be  allowed  to  exceed. 

The output of the program is a sequence of low cost networks. Each
network is identified by the following information:

1.  Network  topology 

2.  Cost  per  month 

3.  Maximum  throughput 

4.  Estimated  average  traffic 


5.  Message  cost  per  megabit  at  maximum  throughput 

6.  Average  message  delay  for  short  messages 

Each acceptable network design also conforms to the standard that
at least two nodes and/or links must fail before all communication
paths between any pair of nodes are disrupted.


APPROACH 


The general design problem as stated above is similar to other
network design problems for which computationally practical
solutions have recently been obtained. These problems include the
minimum cost design of survivable networks [3], the minimum cost
selection and interconnection of Telpaks in telephone networks [4],
the design of offshore natural gas pipeline networks [5], and the
classical Traveling Salesman problem [6]. These problems have long
resisted exact solution; however, recent work on approximate
methods has been extremely successful and has led to efficient
methods of finding low cost solutions in practical computation
times.


The Design Philosophy


By a "feasible" solution, we mean one which satisfies all of the
network constraints. By an "optimal" network, we mean the feasible
network with the least possible cost. Our goal is to develop a
method that can handle realistically large problems in a reasonable
computation time and which can find feasible solutions with costs
close to optimal.


The method to be used has two main parts called the starting
routine and the optimizing routine. The starting routine generates
a feasible solution. The optimizing routine then examines networks
derived from this starting network by means of local
transformations applied to the network topology. When a feasible
network with lower cost is found, it is adopted as a new starting
network and the process is continued. In this way, a feasible
network is eventually reached whose cost cannot be reduced by
applying additional local transformations of the type being
considered. Such a network is called a locally optimum network.
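
The optimizing routine described above is a form of local search.
The following is a minimal sketch on a toy four-node problem, not
the NAC program: the link costs, the feasibility rule (every node
needs at least two incident links) and the transformation set are
all hypothetical stand-ins chosen to keep the example short.

```python
# Hypothetical monthly cost of each candidate link among nodes 0..3.
LINK_COST = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1, (0, 2): 5, (1, 3): 5}
NODES = range(4)

def cost(net):
    return sum(LINK_COST[link] for link in net)

def feasible(net):
    """Toy constraint: every node has at least two incident links."""
    degree = {v: 0 for v in NODES}
    for u, v in net:
        degree[u] += 1
        degree[v] += 1
    return all(d >= 2 for d in degree.values())

def neighbors(net):
    """Local transformations: drop one link, or exchange one link for an absent one."""
    absent = [link for link in LINK_COST if link not in net]
    for old in sorted(net):
        yield net - {old}
        for new in absent:
            yield (net - {old}) | {new}

def local_search(start, neighbors, cost, feasible):
    """Repeat: adopt the first cheaper feasible neighbor; stop when none exists."""
    current = start
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current):
            if feasible(candidate) and cost(candidate) < cost(current):
                current = candidate   # becomes the new starting network
                improved = True
                break
    return current

full = frozenset(LINK_COST)                          # start with every link
crossed = frozenset({(0, 1), (1, 3), (2, 3), (0, 2)})  # a poor starting ring
print(cost(local_search(full, neighbors, cost, feasible)))
print(cost(local_search(crossed, neighbors, cost, feasible)))
```

In this toy case the two starting networks end at local optima of
different cost, which is why the procedure is restarted from many
starting networks.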

Once a locally optimum network is found, the entire procedure is
repeated by again using the starting routine. The starting routine
may incorporate suggestions made by a human designer. For example,
the present tentative configurations for the ARPA Network have been
used. Alternatively, if desired, the starting routine may generate
feasible networks without such advice. At the present time, our
starting routine is capable of generating about 100,000 low cost
networks.

By finding local optima from different starting networks, a variety
of solutions can be generated. Figure 1 shows a diagrammatic
representation of the process. The space of feasible solutions is
represented by the area enclosed by the outer border of the figure;
starting solutions are represented by light circles and local
optima by dark circles. The practicality of the approach is based
on the assumption that with a high probability some of the local
optima found are close in cost to the global optimum. Naturally,
this assumption is sensitive to the particular transformation used
in the optimizing routine. A block diagram of the optimization
procedure is shown in Figure 2.


Local Transformations


A local transformation on a network is generated by identifying a
set of links, removing these links, and adding a new set to the
network. The method of selection of the number and location of the
links to be removed and added determines the usefulness of the
transformation and its applicability to the problem at hand. For
example, in the problem of economically designing offshore natural
gas pipeline networks, dramatic cost reductions were achieved by
removing and adding one link at a time [5]. On the other hand, in a
problem of the minimum cost design of survivable networks [3], the
most useful link exchange consisted of removing and adding two
links at a time. In general, it is not necessary that the same
number of links be added and removed during each application of the
transformation.

DESIGN CONSTRAINTS

The preceding section has given a general approach for the design
of low cost feasible networks. To implement this approach, a number
of specific problems must be considered. These include:

1.  The  distribution  of  network  traffic. 

2.  Network  Route  Selection. 

3.  Link  capacity  assignment. 

4.  Node  and  Link  Time  Delays. 

Distribution  of  Traffic 

At the present time, it is difficult to estimate the precise
magnitude and distribution of the Host-to-Host traffic. However,
one design goal is that the amount of flow that can be transmitted
between nodes should not vary significantly with the locations of
sender and receiver. Hence, two users several thousand miles apart
should receive the same service as two users several hundred miles
apart. A reasonable requirement is therefore that the network be
designed so that it can accommodate equal traffic between all pairs
of nodes. However, it is known that certain nodes have larger
traffic requirements to and from the University of Illinois' Illiac
IV than to other nodes. Consequently, information of this type is
incorporated into the model.

The magnitude of the network traffic is treated as a variable. A
"base" traffic requirement of 500·n bits per second (n is a
positive real number) between all nodes is assumed. An additional
500·n bits per second is then added to and from the University of
Illinois (node No. 9) and nodes 4, 5, 12, 18, 19, and 20. The base
traffic is used to determine the flows in each link and the link
capacities as discussed in the following sections. The value of n
is then increased until the average time delay exceeds 0.2 seconds.
The average number of bits per second per node at an average delay
equal to 0.2 seconds is taken as a measure of performance and the
corresponding cost per bit is taken as a measure of efficiency of
the network.
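
The traffic model can be written down directly. The sketch below is
an illustration under two stated assumptions that the text does not
spell out: "between all nodes" is read as every ordered pair of
distinct nodes, and the additional 500·n bits per second is read as
applying to each pair formed by node 9 and one of the listed nodes,
in both directions. The function name and parameters are
hypothetical.

```python
def traffic_matrix(n, num_nodes=20, hub=9, heavy=(4, 5, 12, 18, 19, 20), base=500.0):
    """Return {(i, j): bits per second} for all ordered pairs of distinct nodes."""
    traffic = {}
    for i in range(1, num_nodes + 1):
        for j in range(1, num_nodes + 1):
            if i == j:
                continue
            rate = base * n                      # uniform base requirement
            if (i == hub and j in heavy) or (j == hub and i in heavy):
                rate += base * n                 # extra traffic to and from node 9
            traffic[(i, j)] = rate
    return traffic

t = traffic_matrix(n=2.0)
print(t[(1, 2)], t[(9, 4)], t[(4, 9)])
```

Raising n scales every entry by the same factor, which is the
uniform increase the design procedure applies until the delay limit
is reached.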

Route  Selection 

In  order  to  avoid  the  prohibitively  long  computation  times 
required  to  analyze  dynamic  routing  strategies,  a  fixed  routing 
procedure  is  used.  This  procedure  is  similar  to  the  one  which  will 
be  used  in  the  operating  network  but  it  has  the  advantage  that  it 
can  be  readily  incorporated  into  analysis  procedures  which  do  not 
depend  on  simulation. 

The routing procedure is determined by the assumption that for each
message a path which contains the fewest intermediate nodes* from
origin to destination is most desirable. Given a proposed network
topology and traffic matrix, routes are determined as follows: For
each i (i = 1, 2, ..., N = 20):

1.  With  node  i  as  an  initial  node,  use  a  labelling 
procedure  [7]  to  generate  all  paths  containing  the  fewest  number 
of  intermediate  nodes,  to  all  nodes  which  have  non-zero  traffic 
from  node  i.  Such  paths  are  called  feasible  paths. 

2. If node i has non-zero traffic to node j (j = 1, 2, ..., N;
j ≠ i) and the feasible paths from i to j contain more than seven
nodes, the topology is considered infeasible.

3.  Nodes  are  grouped  as  follows: 

(a)  All  nodes  connected  to  node  i. 

(b)  All  nodes  connected  to  node  i  by  a  feasible 
path  with  one  intermediate  node. 

(c)  All  nodes  connected  to  node  i  by  a  feasible 
path  with  two  intermediate  nodes. 

(d) ...

(e) ...

(f) All nodes connected to node i by a feasible
path with five intermediate nodes.

* A node j ≠ s, t is called an intermediate node with respect to a
message with origin s and destination t if the path from s to t
over which the message is transmitted contains node j.

Traffic is first routed from node i to any node j which is directly
connected to i over link (i,j). Consequently, after this stage,
some flows have been assigned to the network. Each node in group
(b) is then considered. For any node j in this group, all feasible
paths from i to j are examined, and the maximum flow thus far
assigned in any link in each such path is found. All paths with the
smallest maximum flow are then considered. The path whose total
length is minimum is then selected and all traffic originating at i
and destined for j is routed over this path.* All nodes in group
(b) are treated in this manner. The same procedure is then applied
to all nodes in groups (c), (d), (e) and (f) in that order.
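
The fixed routing procedure can be sketched as follows. This is an
illustration only, with assumptions that go beyond the text: a
breadth-first search stands in for the labelling procedure of [7],
flows are kept separately for each direction of a link, and ties
that remain after comparing maximum flow and length are broken by
enumeration order. All names are hypothetical.

```python
from collections import deque

def min_hop_paths(adj, src):
    """Return {dst: [path, ...]} with every fewest-hop path from src."""
    dist = {src: 0}
    paths = {src: [[src]]}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v].extend(p + [v] for p in paths[u])
    return paths

def route_traffic(adj, length, traffic, max_nodes=7):
    """Route each demand over one fewest-hop path; return {(a, b): flow} or None.

    `length` maps each link, keyed as (smaller node, larger node), to its mileage.
    None means the topology is infeasible: a path is missing or too long.
    """
    def undirected(a, b):
        return (min(a, b), max(a, b))

    flow = {}
    for src in sorted(adj):
        paths = min_hop_paths(adj, src)
        dests = [d for d in sorted(adj)
                 if d != src and traffic.get((src, d), 0) > 0]
        for dst in dests:
            if dst not in paths or len(paths[dst][0]) > max_nodes:
                return None
        # Nearest group first: direct neighbors, one intermediate node, and so on.
        dests.sort(key=lambda d: len(paths[d][0]))
        for dst in dests:
            def rank(path):
                hops = list(zip(path, path[1:]))
                busiest = max(flow.get(h, 0.0) for h in hops)
                miles = sum(length[undirected(a, b)] for a, b in hops)
                return (busiest, miles)
            best = min(paths[dst], key=rank)
            for hop in zip(best, best[1:]):
                flow[hop] = flow.get(hop, 0.0) + traffic[(src, dst)]
    return flow

ring = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
miles = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (1, 4): 1}
demand = {(i, j): 1.0 for i in ring for j in ring if i != j}
print(sum(route_traffic(ring, miles, demand).values()))
```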

* It is also possible to divide the traffic from i to j and send it
over more than one feasible path, but for uniform traffic this is
not an important factor.

Capacity Assignment

Link capacities could be assigned prior to routing. Then, after
route selection, if the flow in any link exceeded its assigned
capacity, the network would be considered infeasible. On the other
hand, link capacities may be assigned after all traffic is routed;
we adopt this approach. The capacity of each link is chosen to be
the least expensive option available from AT&T which satisfies the
flow requirement. The line options which are presently being
considered are: 50,000 bits/sec (bps), 108,000 bps, 230,400 bps,
and 460,000 bps. Monthly link costs are the sum of a fixed terminal
charge and a linear cost per mile. Thus, to satisfy a requirement
of 85,000 bps, depending on the length of the link it is sometimes
cheaper to use two 50,000 bps parallel links and sometimes cheaper
to use a single 108,000 bps link.
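
The 85,000 bps example can be worked through with the monthly
tariffs from the line-option table in this section. The sketch below
is illustrative: it assumes that parallel lines on one link are all
of the same type, and the names are hypothetical. With these
figures, two 50 KB lines cost $1700 + $8.40/mile and one 108 KB line
costs $2400 + $4.20/mile, so the two meet at 700/4.20, about 167
miles.

```python
from math import ceil

# name: (fixed monthly charge in dollars, dollars per mile per month, capacity in bps)
LINE_OPTIONS = {
    "50 KB": (850.0, 4.20, 50_000),
    "108 KB": (2400.0, 4.20, 108_000),
    "230.4 KB": (1300.0, 21.00, 230_400),
    "460 KB": (1300.0, 60.00, 460_000),
}

def cheapest_option(required_bps, miles):
    """Return (line type, number of parallel lines, monthly cost) of least cost."""
    best = None
    for name, (fixed, per_mile, capacity) in LINE_OPTIONS.items():
        count = ceil(required_bps / capacity)    # parallel lines of this type needed
        cost = count * (fixed + per_mile * miles)
        if best is None or cost < best[2]:
            best = (name, count, cost)
    return best

print(cheapest_option(85_000, 100))    # a short link
print(cheapest_option(85_000, 1000))   # a long link
```

Over 100 miles the two 50 KB lines win; over 1000 miles the single
108 KB line does.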

The following line options and costs have been investigated:

Type                         Speed       Cost Per Month

Full Group (303 data set)    50 KB       $ 850 + $ 4.20/mile
Full Group (304 data set)*   108 KB      $2400 + $ 4.20/mile
Telpak C                     230.4 KB    $1300 + $21.00/mile
Telpak D                     460 KB      $1300 + $60.00/mile

* Not a standard AT&T offering.

Link and Node Delays

Response time $T$ is defined as the average time a message takes to
make its way through the network from its origin to its
destination. Short messages are considered to correspond to a
single packet which may be as long as 1008 bits or as short as a
few bits, plus the header. If $T_i$ is the mean delay time for a
packet passing through the i-th link, then

$$T = \frac{1}{r} \sum_{i=1}^{M} y_i T_i ,$$

where $r$ is the total IMP-to-IMP traffic rate, $y_i$ is the
average traffic rate in the i-th link, and $M$ is the total number
of links. $T_i$ can be approximated with the Pollaczek-Khinchin
formula as:

$$T_i = \frac{1}{\mu C_i} \left[ 1 + \frac{y_i (1 + a^2)}{2 (\mu C_i - y_i)} \right]$$

where $1/\mu$ is the average packet length (in bits), $C_i$ is the
capacity of the i-th link (in bits/second), and $a$ is the
coefficient of variation of the packet length.
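
The delay model can be evaluated numerically. The sketch below reads
the Pollaczek-Khinchin expression as T_i = (1/(mu C_i)) [1 + y_i (1
+ a^2) / (2 (mu C_i - y_i))] and assumes y_i is a packet rate in
packets per second, so that it can be compared with the service
rate mu C_i; both the unit convention and the function names are
assumptions made for illustration.

```python
def link_delay(y, capacity, mean_packet_bits, a=1.0):
    """Mean packet delay in seconds on one link.

    y: packet rate on the link (packets/second)
    capacity: link capacity C_i (bits/second)
    mean_packet_bits: average packet length 1/mu (bits)
    a: coefficient of variation of the packet length
    """
    service_rate = capacity / mean_packet_bits    # mu * C_i, packets/second
    if y >= service_rate:
        return float("inf")                       # the link is saturated
    waiting = y * (1.0 + a * a) / (2.0 * (service_rate - y))
    return (1.0 / service_rate) * (1.0 + waiting)

def network_delay(r, rates, delays):
    """T = (1/r) * sum of y_i * T_i over all links; r is the total traffic rate."""
    return sum(y * t for y, t in zip(rates, delays)) / r

# One 50,000 bps link carrying 40 packets/second of 560-bit packets.
print(link_delay(40.0, 50_000, 560.0, a=1.0))
print(link_delay(40.0, 50_000, 560.0, a=0.0))
```

With a = 1 the expression reduces to 1/(mu C_i - y_i), the familiar
result for exponentially distributed packet lengths; smaller values
of a give a smaller delay.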


These parameters are evaluated as follows:

(1) $r$ is the sum of all elements in the traffic matrix after each
element has been adjusted to include headers, parity check and
requests for next message (RFNM).

(2) $y_i$ is determined by the routing strategy.

(3) In calculating $1/\mu$, we consider three kinds of packets: (a)
packets generated by short messages and all other packets (except
RFNM's) with length less than 1008 bits; (b) full length packets of
1008 bits belonging to long messages; (c) RFNM's. It is assumed
that the packets of part (a) are uniformly distributed with mean
length equal to 560 bits. The packet length for part (b) is a
constant equal to 1008 bits. The average packet length is then
calculated by first estimating the average number of packets with
1008 bits. It is assumed that each long message consists of an
average of 4 packets. In many of our computations, we assume that
80% of the messages are short. The number of RFNM packets can then
be estimated. Finally, since the average length of each type of
packet is known and the number of each type of packet has been
estimated, the average packet length can be estimated.

(4) $y_i$ is adjusted to include the increased traffic due to
acknowledgments. $C_i$ is then selected as already described.

(5) The larger the value of $a$, the larger the delay time. For the
exponential distribution $a = 1$; for a constant, $a = 0$; and for
many distributions $0 \le a \le 1$. Since it is reasonable to
assume that the packet length distribution being considered is very
close to the combination of a uniform distribution and a constant,
the value of $a$ should be less than one. To avoid underestimating
$T$, $a$ is set equal to one in all calculations.

The above analysis is based on the assumption that the number of
available buffers is unlimited. When the traffic is low, this
assumption is very accurate. For high traffic, adjustments to
account for the limitation of buffer space are necessary.

There are two roles for buffers in an IMP: one for reassembling
messages destined for that IMP's Host and the other for
store-and-forward traffic. At the present time, about one half of
the IMP's core is used for the operating program. The remainder
contains about 84 buffers, each of which can store a single packet.
Up to 2/3 of the buffers may be used for reassembly. Buffers not
used for reassembly are available for store-and-forward traffic.
When no buffer is available for reassembly, any arriving packet
which requires reassembly but does not belong to any message in the
process of reassembly will be discarded and no acknowledgment
returned to the transmitting IMP. This packet must then be
retransmitted, and the effective traffic in the link is therefore
increased. In addition, each time a packet is retransmitted, its
delay time is increased not only by the extra waiting and
transmission time, but also by the 100 ms time-out period. To
account for these factors, an upper bound on the probability that
no buffer is available is calculated for each IMP. The traffic
between IMPs is then increased and the extra delay time for the
retransmitted packets is calculated. The increase in delay time is
then averaged over all the packets.

When no buffer is available for store-and-forward traffic, all
incoming links become inactive. Effectively, the average usable
capacities of these links are lower than their actual capacities.
The probability that no buffer is available for store-and-forward
traffic is set equal to the average of an upper bound and a lower
bound; the upper bound is calculated by assuming that the ratio of
flow to capacity of each link into the IMP is equal to the maximum
ratio for all links at that node, while the lower bound is found by
assuming that the ratio of flow to capacity for each link is equal
to the minimum such ratio. Link capacities are then reduced to
include this effect and the response time is then recalculated. An
example of the effect of the above assumptions is shown in Figure
4. Figure 4 relates average time delay and throughput per node for
the network shown in Figure 3. Two curves are shown. One is
obtained by assuming that there are an infinite number of buffers
at each node. The second curve is obtained by using the actual
buffer limitations of the ARPA network.


COMPUTATIONAL  RESULTS 


The computer program described above was employed to design many
low cost networks under varying assumptions. In this section, we
summarize the most significant of these results. Among the
parameters that were varied in the designs were:

1.  number  and  identity  of  nodes 

2.  link  capacities 

3.  traffic  levels 

A maximum of twenty nodes, as identified in the table below, were
considered. Layouts contained all or a subset of these nodes.

The  following  cases  will  be  discussed: 

a.  Twelve  Node  Networks  containing  Nodes  1-11,  14 

b.  Sixteen  Node  Networks  containing  Nodes  1-11,  13-17 

c.  Eighteen  Node  Networks  containing  Nodes  1-11,  13-18,  20 

All  nodes  were  constrained  to  have  no  more  than  5  incident  links  and 
node  1,  no  more  than  4  incident  links. 

TABLE 1

Node No.   Node Name          Node Location
                              LAT.     LONG.

 1         UCLA               34 04    118 31
 2         SRI                37 22    122 10
 3         SB                 34 50    119 45
 4         UTAH               40 40    111 50
 5         RAND               34 00    118 35
 6         BBN                42 30     71 20
 7         SDC                34 01    118 33
 8         MAC                42 30     71 12
 9         ILLINOIS           40 05     88 30
10         HARVARD            42 30     71 15
11         CARNEGIE-MELLON    40 30     79 50
12         LRL                37 33    122 44
13         BTL                40 45     74 15
14         LINCOLN LABS       42 25     71 20
15         CASE               41 30     81 45
16         STANFORD           37 18    122 10
17         MITRE              39 00     77 00
18         NCAR DENVER        39 30    105 00
19         PRINCETON          40 30     74 30
20         AFWS OMAHA         41 00     96 00

A major consideration is the effect of the 304 Data set on network
cost and performance. This data set will allow a 50 Kilobit line to
be driven at 108 Kilobits at no additional line cost. An additional
terminal charge for this data set is required, but this charge is
independent of mileage and hence it can be an economical means of
increasing the capacities of cross country lines. Since the
capacities of the cross country lines often limit the overall
capability of the network, it is to be expected that the 304 Data
set option can enhance the network's operating performance.
Networks were optimized with and without this option. Graphs of
cost versus throughput for these cases are given below.

The effect of traffic levels and distribution upon performance was
also examined. Traffic load is typically assumed to be at a uniform
base level except for 100% additional traffic to and from node 9
and nodes 4, 5, 12, 18, 19, and 20. The base level of the traffic
is then a design parameter. For a specified traffic matrix, flows
are routed and capacities assigned to the links. The elements of
the traffic matrix are then increased, thus increasing each link
flow and the average time delay. The process is complete when the
network saturates. At each step in the iteration a uniform
percentage increase in the traffic matrix takes place. The vast
majority of all design experiments follow this example. However, in
a few cases special studies were made to determine the effect of
high concentrations of traffic between nodes 9, 18, and 19. These
studies indicated that a "normally loaded" network (10
Kilobits/sec/Node) could accommodate this additional traffic
without a substantial increase in cost if 108 Kilobit lines can be
used.

Finally, the effect of using lines leased as of September 30, 1969
in the designs was examined. As of September, twelve lines
connecting nine nodes had been ordered from AT&T. As part of our
study, it was necessary to determine whether the use of these lines
in the optimized networks would significantly affect the operating
characteristics and economies when compared with the case when any
lines could be used. Therefore, two sets of optimizations were
performed; in one, the lines indicated in Table 2 were constrained
to appear in all network designs; in the other, there were no such
constraints. It was found that networks containing these lines
could be designed which were as economical as the best networks
found without these constraints.


TABLE 2

LEASED LINES AS OF SEPTEMBER 30, 1969

(1, 2), (1, 3), (2, 3), (2, 4), (3, 7), (4, 7),
(4, 9), (5, 6), (5, 7), (6, 8), (8, 9)

It is important to note that the costs given are the cost of
leasing lines and do not include the cost of the IMPs. Also, it
must be emphasized that the results to be presented are empirical.
Hence, we do not claim that the observations we present are
definitive, and indeed it may be necessary to revise them. However,
we feel that the following results can provide a useful step
towards a better understanding of the behavior of store-and-forward
computer networks.


As an example of the studies performed, we will discuss the
design of twelve node networks. This is the smallest operating
network which can be expected to test adequately the design philosophy
of the ARPA Network. The twelve nodes considered are the first
to be activated in the ARPA installation schedule and are nodes
labeled 1-11 and 14. One of the first goals in our study was to
design networks which would operate effectively first as twelve
node systems and then as twenty node systems when later expanded.
The network designs can be represented on a scatter diagram. The
coordinate of the horizontal axis of the diagram is cost in dollars
per year. The coordinate of the vertical axis is the average
throughput* per node in bits per second for a specified distribution
of traffic. The graph shown in Figure 5 is drawn for a specified
maximum average message time delay of 0.19 seconds for short
messages. Each point in the graph corresponds to a network
generated, evaluated, and optimized by the computer.


To interpret these results, consider any point $P_1$
corresponding to a network $N_1$. Draw a horizontal line starting at
$P_1$ to the right of $P_1$ and a vertical line down from $P_1$. Any
point, say $P_2$, which falls within the quadrant defined by the two
lines is said to be dominated by $P_1$, since in a sense, network
$N_1$ is "better than" network $N_2$. Similarly, $N_1$ is said to be a
dominant network. That is, $N_1$ for the same delay provides at least
as much throughput as $N_2$ at no higher cost. Horizontal and vertical
lines can be drawn through certain points $P_1, \ldots, P_n$ so that
all other points are dominated by at least one of these.
$P_1, \ldots, P_n$ thus represent, in one sense, the best networks.


* Throughput is the average number of bits/second out of each node.

It should be noted that a network which is dominant for
one time delay may not be dominant for another. Many networks
with this property have been found in our studies. Furthermore,
in some cases a network may be dominated but might still be
preferable to the network which dominates it because of other
factors such as the order of leasing lines and plans for future
growth. For example, a point may be dominant and yet there
are points which it dominates which are very close to it and
might well be preferable.
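
The dominance test described above can be sketched in a few lines of Python. This is an illustration only, not the program used in the study: the function names are hypothetical, the sample points are invented rather than read from Figure 5, and "dominant" is taken here to mean "dominated by no other design".

```python
from typing import List, Tuple

# A design is (cost, throughput); all numbers used below are made up.
Design = Tuple[float, float]

def dominates(a: Design, b: Design) -> bool:
    """True if a costs no more than b and provides at least as much throughput."""
    return a != b and a[0] <= b[0] and a[1] >= b[1]

def dominant_designs(designs: List[Design]) -> List[Design]:
    """Return the designs that no other design dominates, sorted by cost."""
    frontier = [d for d in designs
                if not any(dominates(other, d) for other in designs)]
    return sorted(frontier)

# Hypothetical points: (cost in thousands of dollars, Kbits/sec/node).
sample = [(700, 8.0), (760, 10.0), (800, 9.5), (900, 13.0), (950, 12.0)]
frontier = dominant_designs(sample)
print(frontier)
```

In this made-up sample, (800, 9.5) lies in the quadrant below and to the right of (760, 10.0), and (950, 12.0) lies in the quadrant of (900, 13.0), so both are dropped.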

Figure 6 indicates the cost-throughput characteristics
of a number of dominant networks. In addition to the line cost
per month and the average number of kilobits out of each node,
we indicate whether the links given in Table 2 were constrained
to be in the design. The presence of 304 data sets in the design
and the "connectivity" (i.e. the minimum number of nodes and/or
links whose failure will disconnect the network) are also
indicated. Note that although many designs do not use 304 data sets,
this option was available in all designs.

From Figure 6 it is clear that for rates below 29
kilobits/sec/node, significantly greater economies are obtainable
with connectivity 1 networks than with connectivity 2 networks.
This is because fewer lines need be used, data may be concentrated
through fewer long lines, and 304 data sets used for these lines.
By drawing a horizontal line between the connectivity 1 and
connectivity 2 curves, the cost of the additional reliability of
the connectivity 2 networks can be measured.

Figure 7 shows a scatter plot for 20 node networks
designed with the 108 Kilobit/second 304 Data Set Option. Figure 8
indicates cost-throughput tradeoffs for 20 node networks with and
without this option. Figure 9 presents this data in a different
form, as the cost per megabit of transmitted information
versus the required investment to achieve this cost.
These costs were obtained by assuming that the network would be
in use for 24 hours per day; for less heavily utilized systems,
the appropriate adjustments must be made.
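
The adjustment for utilization can be illustrated with a short Python sketch. The conversion behind Figure 9 is not stated here, so the formula is an assumption: monthly line cost divided by the megabits sent in a 30 day month, with every node transmitting at its average rate. The function name, parameter names and dollar figures are hypothetical.

```python
def cost_per_megabit(monthly_cost_dollars: float,
                     nodes: int,
                     kbits_per_sec_per_node: float,
                     hours_per_day: float = 24.0,
                     days_per_month: float = 30.0) -> float:
    """Dollars per megabit, assuming every node sends at its average rate."""
    seconds_in_use = hours_per_day * 3600.0 * days_per_month
    megabits_sent = nodes * kbits_per_sec_per_node * seconds_in_use / 1000.0
    return monthly_cost_dollars / megabits_sent

# Hypothetical 20 node network at 10 Kbits/sec/node and $80,000 per month.
full_time = cost_per_megabit(80_000, nodes=20, kbits_per_sec_per_node=10)
eight_hours = cost_per_megabit(80_000, nodes=20, kbits_per_sec_per_node=10,
                               hours_per_day=8)
print(full_time, eight_hours)
```

Under these assumptions a network used 8 hours per day costs three times as much per megabit as one used around the clock, since the leased line cost is fixed.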

Figure 10 summarizes the results of the network
optimizations on 12, 16, 18, and 20 node networks without 108 kilobit
lines. One immediate observation is that the node location and
the traffic level are crucial factors in overall performance. In
this figure, we plot total network cost against the total
Host-to-Host traffic. Figure 11 shows total cost versus the average
throughput per node for 16, 18, and 20 node networks with and
without 108 kilobit lines.


FIGURE 7: 20 NODE NETWORK SCATTER DIAGRAM
(vertical axis: K bits/sec/node)

FIGURE 8: 20 NODE NETWORK CHARACTERISTICS
(vertical axis: K bits/sec/node)

FIGURE 9: 20 NODE NETWORK DATA COSTS

FIGURE 10: NETWORK CHARACTERISTICS WITHOUT 108 K BIT/SEC LINES

FIGURE 11: THROUGHPUT PER NODE VS. TOTAL NETWORK COST


Figures 12 and 13 show the average cost per node
versus the total Host-to-Host traffic and the average traffic
per node, respectively. Cost-throughput characteristics are
given for systems with and without 108 kilobit lines.

Finally, Figures 14(a), (b) and (c) show typical
computer designed networks. Figure 14(a) shows a 12 node
network using a 108 kilobit line. This was the only 12 node
network (except for trivial modifications) which was found that
had connectivity 2 and still used a 108 kilobit line. This
network had the lowest cost-throughput ratio of all networks
generated. Figure 14(b) shows a very economical 18 node network
designed without 108 kilobit lines. In addition to a low
cost-throughput ratio, this network has two other desirable
characteristics. First, it is economical as a 16 node network if nodes
18 and 20 are deleted (as well as lines (4,18), (18,20) and (20,15))
and a 50 kilobit/sec line is added from node 4 to 15. Second, the
network's performance can be considerably enhanced by adding
50 Kilobit lines between nodes (3,7), (9,20), and (2,17). These
additions result in a throughput increase of 7 Kilobits/sec/node
at an additional monthly cost of only $15,000. Figure 14(c) shows
a 20 node network designed for low total cost at throughput levels
projected for the ARPA Network. The 108 Kilobit option is allowed.
To take advantage of this option the computer concentrates the
traffic by transmitting the flow generated in the Los Angeles
area to the San Francisco area, where all cross country traffic
is transmitted over high rate lines to the East Coast. A similar
pattern occurs for the East-to-West Coast traffic.

CENTRALIZED NETWORKS

Many computer networks consist of a set of remote sites
connected to a central node. For example, most time sharing systems,
computer reservation systems, accounting systems, etc., are of this
type. In addition, many networks are hierarchical. That is, they are
interconnections of centralized networks. The ARPA network seems to
be evolving in this direction. At each node in the network a number
of computers and terminals may eventually be connected. In addition,
such systems must be considered in studying the economics of large
computer networks. Major problems which arise in designing these
systems are the layout and sizing of the connections between nodes.
In order to solve these problems, the network designer is again faced
with a discrete design problem which is intractable using existing
integer programming methods for problems of practical size.

The objective is to select link locations and capacities so
that the average time delay required to transmit a standard size
message from any node to the central node does not exceed a specified
number. This maximum allowable average delay time may, in some
cases, vary from node to node. The design problem is then to find
the least cost system which satisfies the time delay constraints
for specified levels of traffic between nodes.

A strong case can be made for designing "tree" like
centralized computer networks. That is, the nodes are connected
by the minimum number of possible links and there is exactly one
transmission path between any pair of nodes. Although it is
possible to construct situations in which trees are not optimal,
they represent a reasonable class of networks for the layout
problem. However, even if one restricts the range of designs to
trees, the globally optimal network is usually impossible to find.

We now consider the optimal design of centralized
computer networks. This study is a first step in our study of
growth properties of store-and-forward networks. We describe a
method to select globally least cost link capacities for a specified
tree structure when maximum average delay times are specified for
each node. We also give a heuristic method for finding low cost
tree structures. These methods will be used to study the
cost-time delay-throughput characteristics of large hierarchical
networks. The methods, which grew out of design studies for natural
gas and irrigation systems, have been programmed and are capable of
handling networks with several thousand nodes. In addition, they
allow an arbitrary set of link capacities and cost structure and do
not depend on the mathematical model used to calculate average time
delay.

We  consider  the  following  network  model. 


(1) The network topology is a tree. The network contains N
nodes numbered from 1 to N. A link between nodes i and j is denoted
by the unordered pair (i,j). Node 1 is the central computer.

(2) Each link may be assigned one of a finite number of
capacities $C_1, C_2, \ldots, C_K$. The capacity of link (j,k) is
denoted by c(j,k) and, for simplicity, each link is assumed to be
full duplex (not a necessary assumption).

(3) The cost of assigning capacity $C_i$ to link (j,k) is an
arbitrary function of various parameters such as distance, capacity,
error rate, data set, and so on.

(4) The average time delay t(j,k) required to transmit b bits
per second from node j to node k over link (j,k) can be expressed as

    $t(j,k) = T(c(j,k), b, \ldots) = T(c(j,k), \cdot)$

where $\cdot$ represents all parameters other than c(j,k). The only
property that we impose upon the function $T$ is the physically
motivated one that

    $T(C_i, \cdot) > T(C_j, \cdot)$ if $C_j > C_i$.

For example, we may use the time delay equations given earlier.

(5) Nodal time delays are insignificant. (This restriction
can be removed but a complete treatment is lengthy.)

(6) A traffic matrix $R = [r_{i,j}]$ is specified where $r_{i,j}$
is the number of bits per second from node i to node j. All traffic
from node i to node j ($i \neq 1$, $j \neq 1$) must be routed through
node 1. Thus, the network traffic can be described by two vectors
$R_1$ and $R_2$. In a time sharing system with only one main
computer, $r_{i,j}$ may equal zero when neither i nor j represents
the central node 1. On the other hand, all of the off-diagonal
entries in R may be non-zero for a computer-communication network.

Consider a fixed topological structure G such as the one shown
in Figure 15. Capacities are to be assigned with minimal cost to
this tree so that the maximum average time delay for transmission
from any node to node 1 does not exceed $t_{\max}$. A tree has the
property that there is exactly one path between each pair of nodes.
Consequently, given G, $R_1$, and $R_2$, the flows in each network
link are uniquely determined. Let $d_i$, the "degree" of node i, be
the number of links incident at i. A node j is said to be "pendant"
if $d_j = 1$. Let $t_i$ be the average time required to send a
message from node i to node 1. With these definitions, it is easy to
see that

    $\max_i t_i \le t_{\max}$ if and only if $\max_{\{i : d_i = 1\}} t_i \le t_{\max}$.

That is, in order to guarantee that $t_{\max}$ is not exceeded for
transmission from any node in the network, we need only guarantee
that this is true for pendant nodes. This property is used crucially
in the algorithm to follow, since limitations on network performance
can be determined by considering only pendant nodes.
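
The pendant node property can be checked numerically with a short Python sketch. It assumes, in line with item (5), that the delay from a node to node 1 is the sum of the link delays along its unique path. The tree shape and the delays on links (9,1) and (2,1) are hypothetical, and the function names are not from the report.

```python
from typing import Dict, List, Tuple

def delays_to_center(parent: Dict[int, int],
                     link_delay: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Delay from each node to node 1, summing link delays along the tree path."""
    delay = {1: 0.0}

    def walk(node: int) -> float:
        if node not in delay:
            up = parent[node]
            delay[node] = link_delay[(node, up)] + walk(up)
        return delay[node]

    for node in parent:
        walk(node)
    return delay

def pendant_nodes(parent: Dict[int, int]) -> List[int]:
    """Nodes with no node below them (degree one), excluding central node 1."""
    has_child = set(parent.values())
    return [n for n in parent if n not in has_child]

# Hypothetical tree: parent[n] is the next node on the path from n to node 1.
parent = {9: 1, 2: 1, 10: 9, 11: 10, 12: 10, 13: 10}
link_delay = {(9, 1): 50.0, (2, 1): 80.0, (10, 9): 133.0,
              (11, 10): 120.0, (12, 10): 150.0, (13, 10): 94.0}
delay = delays_to_center(parent, link_delay)
pendant = pendant_nodes(parent)
print(max(delay.values()), max(delay[n] for n in pendant))
```

Both maxima agree because every non-pendant node lies on the path of some pendant node, whose delay is at least as large.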

The first problem to be considered is: given the tree and the
traffic, find the least cost set of link capacities so that the
maximum time delay required to transmit a message from any node to
node 1 does not exceed a specified constant $t_{\max}$. Choosing
capacities for some of the links and leaving the remaining capacities
unspecified will be called a partial assignment. Methods will now be
given to recognize partial assignments which cannot be in the optimal
assignment. These partial assignments are discarded without
discarding any partial assignment which might be in an optimal
assignment.

FIGURE 15: CENTRALIZED NETWORK

Associate with each link two lists, called COST and
DELAY. The i-th component of COST is the cost associated with the
i-th smallest capacity choice. The i-th component of DELAY is the
time delay for that link arising from a choice of the i-th smallest
capacity. The values of the elements of COST are in increasing
order and those of DELAY in decreasing order. The two lists taken
together will be called a link array. Two techniques will now be
given which, when used together on a given tree, can efficiently
process these lists to obtain the optimal capacity assignment.
These techniques were discovered by D. Kleitman and were first
applied to the design of offshore natural gas pipeline networks.
Variations and extensions of the algorithm have since been developed
by the authors and applied to the design of Cable Television Systems
and large scale irrigation systems. The following discussion is
adapted from reference 5.

The first technique is called the parallel merge. The
procedure can be used on any set of links which directly connect
pendant nodes to a common node. We will use links (11,10), (12,10)
and (13,10) from the tree of Figure 15 to illustrate the procedure.
Suppose that there are seven possible capacities for each link and
that the arrays for these links are as follows:

(11,10) array:  DELAY: (120, 111, 92, 66, 54, 40, 31)
                COST:  (13, 17, 23, 29, 36, 45, 58)

(12,10) array:  DELAY: (150, 139, 118, 87, 75, 70, 67)
                COST:  (6, 9, 14, 21, 30, 40, 56)

(13,10) array:  DELAY: (94, 86, 80, 61, 55, 48, 32)
                COST:  (8, 12, 18, 26, 34, 43, 57)

where the delays are in milliseconds and costs in hundreds of dollars.
A testing block is set up as follows:


          (11,10)   (12,10)   (13,10)
DELAY       120      (150)       94
COST         13         6         8
INDEX         1         1         1

Each link is assigned a column in the testing block as indicated.
If the index in a column is set to i, then the DELAY and COST
entries in that column will be the i-th components of the lists
for that link. Initially, the indices are set to 1 and the testing
block is as shown above.

The procedure locates the largest entry in the DELAY row of
the testing block. In our example, this occurs in column 2 and
the entry is marked with parentheses. If the smallest capacity is
chosen for (12,10), the delays at nodes 11 and 13 can never exceed
the delay at node 12. Thus, choosing other than the minimum
capacities for these links when (12,10) has the minimum capacity will
increase the total cost of the links but cannot reduce the maximum
time delay.

We now enter the largest DELAY entry and the sum of the COST
entries of the testing block in a new array. This entry in the new
array corresponds to the partial assignment of minimum capacities to
(11,10), (12,10) and (13,10) and is shown below.

DELAY: (150)
COST:  (27)

Since no better choice of capacities for (11,10) and (13,10) is
possible with (12,10) at this capacity, we increase the index in
the second column of the testing block, which yields


          (11,10)   (12,10)   (13,10)
DELAY       120       139        94
COST         13         9         8
INDEX         1         2         1


In the updated testing block, the new maximum DELAY entry
is still in the second column. This means that if (12,10) has the
second smallest capacity, it still will not pay to have (11,10) or
(13,10) at any capacities other than the smallest. We make a second
entry in the new list as before to give

DELAY: (150, 139)
COST:  (27, 30)

This new entry represents a partial assignment of the second
smallest capacity to (12,10) and the smallest capacity to (11,10)
and (13,10).


The process terminates when the largest entry of the DELAY
row of the testing block occurs in a column whose index has been
promoted to its maximum value, 7. Further promotion of the other
indices would correspond to partial assignments of greater cost and
no possible savings in maximum time delay.

Each entry in the final new array (see below) represents
an assignment of capacities to the links (11,10), (12,10) and
(13,10). Furthermore, no other partial assignments for these links
need be considered. Note that the number of possible partial
assignments for these three links is $7^3 = 343$. However, the
parallel merge technique will produce an array with at most 19
columns, one from the original testing block and one each from the
testing blocks resulting from a maximum of 18 index promotions. The
minimum number of columns in a new list is 7.

DELAY: (150, 139, 120, 118, 111, 94, 92, 87, 86, 80, 75, 70, 67)
COST:  (27, 30, 35, 39, 46, 52, 56, 62, 71, 77, 85, 95, 111)
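
The index promotion procedure just described can be sketched in Python as follows. This is a minimal illustration written from the description above, not the program used in the study; the names `parallel_merge` and `LinkArray` are hypothetical, and indices are 0-based rather than 1-based.

```python
from typing import List, Tuple

LinkArray = Tuple[List[int], List[int]]  # (DELAY list, COST list)

def parallel_merge(arrays: List[LinkArray]) -> LinkArray:
    """Merge the arrays of links that join pendant nodes to a common node.

    Each DELAY list must be decreasing and each COST list increasing.
    """
    index = [0] * len(arrays)           # 0-based version of the INDEX row
    new_delay: List[int] = []
    new_cost: List[int] = []
    while True:
        delays = [arrays[k][0][index[k]] for k in range(len(arrays))]
        costs = [arrays[k][1][index[k]] for k in range(len(arrays))]
        worst = max(range(len(arrays)), key=lambda k: delays[k])
        new_delay.append(delays[worst])  # largest entry of the DELAY row
        new_cost.append(sum(costs))      # sum of the COST row
        if index[worst] == len(arrays[worst][0]) - 1:
            break                        # that column is at its largest capacity
        index[worst] += 1                # promote the column with the largest delay
    return new_delay, new_cost

link_11_10 = ([120, 111, 92, 66, 54, 40, 31], [13, 17, 23, 29, 36, 45, 58])
link_12_10 = ([150, 139, 118, 87, 75, 70, 67], [6, 9, 14, 21, 30, 40, 56])
link_13_10 = ([94, 86, 80, 61, 55, 48, 32], [8, 12, 18, 26, 34, 43, 57])
merged = parallel_merge([link_11_10, link_12_10, link_13_10])
print(merged)
```

Run on the three arrays of the example, the sketch yields the 13 column array listed above after 12 index promotions.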

The parallel merge produces an array whose entries
correspond to partial assignments which are candidates for inclusion in
the optimal assignment. The new array can be viewed as the DELAY
and COST lists of an equivalent link which replaces those links
whose arrays were merged. This equivalent link can be thought of as
a link connected between node 10 and a node consisting of a
combination of nodes 11, 12, and 13. (Hence the name parallel merge.)
Note that the components of the equivalent COST and DELAY lists are
respectively in increasing and decreasing order so that no
reordering is required.

It now becomes desirable to combine the array of the
equivalent link with the array of (10,9) to create a new equivalent
array for (11,10), (12,10), (13,10) and (10,9). Again we wish to
retain as few partial assignments as possible without eliminating
any partial assignments which can possibly be in the optimal
assignment. A technique for accomplishing this, called the serial
merge, is described next.

The serial merge can be used on any two links incident to
a common node of degree two if at least one of the two links is also
incident to a pendant node. We use the equivalent array obtained
above and the following (10,9) array to illustrate the serial
merge:

(10,9) array:  DELAY: (133, 124, 104, 78, 65, 51, 42)
               COST:  (6, 10, 15, 23, 33, 43, 59)

We set up a testing block with 7 columns as follows:

            1     2     3     4     5     6     7
DELAY     283   274   254   228   215   201   192
COST       33    37    42    50    60    70    86
INDEX       1     1     1     1     1     1     1

The i-th column corresponds to the i-th smallest capacity choice for
(10,9), and an index equal to j in a column corresponds to the j-th
partial assignment of (11,10), (12,10), and (13,10) in their
equivalent array. The DELAY and COST entries in a column are the
corresponding maximum delay and the sum of link costs that would
result from such a partial assignment of the four links.

Initially, the indices are all set to 1. The testing
block given above therefore gives all the data for the partial
assignments for every choice of capacity for (10,9) with the partial
assignment of (11,10), (12,10) and (13,10) corresponding to the
first component of the equivalent array. Thus, the DELAY entry in
the i-th column of the initial testing block is the sum of the i-th
DELAY component for (10,9) and the first DELAY component of the
equivalent link list. Similarly, the i-th column COST entry is the
sum of the i-th (10,9) COST component and the first equivalent link
COST component.

We  now  locate  the  maximum  entry  in  the  DELAY  row  of  the 
testing  block.  Initially,  this  will  always  occur  in  the  first 
column.  The  DELAY  and  COST  entries  of  this  column  become  candidate 
components  in  another  equivalent  link  array.  We  then  increase  the 
index  in  the  first  column  to  yield 


            1     2     3     4     5     6     7
DELAY     272   274   254   228   215   201   192
COST       36    37    42    50    60    70    86
INDEX       2     1     1     1     1     1     1

The largest DELAY entry is now in the second column. The DELAY and
COST entries in this column become the second component in the new
array. The updating of the testing block yields

            1     2     3     4     5     6     7
DELAY     272   263   254   228   215   201   192
COST       36    40    42    50    60    70    86
INDEX       2     2     1     1     1     1     1

The largest DELAY entry, 272, is now in the first column. Entering
it, the new array becomes

DELAY: (283, 274, 272)
COST:  (33, 37, 36)

Each column in the new array corresponds to a partial assignment of
(11,10), (12,10), (13,10) and (10,9). The DELAY row has its
components in non-increasing order, but the last COST component in
the array is smaller than the second COST component. The partial
assignment corresponding to the third column is always preferable
to the partial assignment corresponding to the second column, since
the former has a lower link cost and cannot result in a greater
maximum time delay. We therefore can eliminate the second column
from further consideration. Our array therefore reduces to

DELAY: (283, 272)
COST:  (33, 36)

In general, when a new column is added to the array, we
eliminate all columns already on the list with COST components
which are not smaller than the latest entry. Since after each
change the COST vector components in the array will be in increasing
order, the updating of the array is easy to implement.

As  we  proceed  with  the  serial  merge,  each  of  the  13  columns 
of  the  equivalent  link  array  will  form  a  candidate  with  each  of  the 
7 (10,9) array components. Thus, a total of 7(13) = 91 candidates
must  be  processed.  However,  as  we  have  seen,  some  of  these  candidates 
can  be  eliminated.  For  the  example  under  consideration,  only  31  of 
the  91  candidates  are  retained  and  these  constitute  an  equivalent 
link  array  for  (11,10),  (12,10),  (13,10)  and  (10,9).  In  general 
only  a  small  fraction  of  the  candidates  in  a  serial  merge  will  be 
retained.  A  greater  percentage  of  the  earlier  and  later  candidates 
will  generally  be  retained  than  those  in  the  middle,  so  the  power  of 
the  elimination  procedure  is  not  fully  illustrated  by  the  small 
example  given. 
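
A minimal Python sketch of the serial merge follows. It is not the authors' implementation: instead of maintaining the INDEX row it forms all candidates and sorts them by decreasing delay, which is assumed to visit them in the same order as the testing block once exhausted columns are set aside. The elimination rule is the one stated above. The name `serial_merge` is hypothetical.

```python
from typing import List, Tuple

LinkArray = Tuple[List[int], List[int]]  # (DELAY list, COST list)

def serial_merge(link: LinkArray, equivalent: LinkArray) -> LinkArray:
    """Combine a link with the equivalent link hanging below it.

    Delays add along the path and costs add; a column is dropped as soon
    as a later column costs no more than it does.
    """
    candidates = [(d1 + d2, c1 + c2)
                  for d1, c1 in zip(*link)
                  for d2, c2 in zip(*equivalent)]
    # Largest delay first; among equal delays the cheaper candidate comes last.
    candidates.sort(key=lambda dc: (-dc[0], -dc[1]))
    kept: List[Tuple[int, int]] = []
    for delay, cost in candidates:
        while kept and kept[-1][1] >= cost:
            kept.pop()                   # not smaller than the latest entry
        kept.append((delay, cost))
    return [d for d, _ in kept], [c for _, c in kept]

link_10_9 = ([133, 124, 104, 78, 65, 51, 42], [6, 10, 15, 23, 33, 43, 59])
equivalent = ([150, 139, 120, 118, 111, 94, 92, 87, 86, 80, 75, 70, 67],
              [27, 30, 35, 39, 46, 52, 56, 62, 71, 77, 85, 95, 111])
delays, costs = serial_merge(link_10_9, equivalent)
print(list(zip(delays, costs))[:3])
```

The first retained columns are (283, 33) and (272, 36), in agreement with the reduced array shown earlier; the candidate (274, 37) is eliminated when (272, 36) is entered.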

Since the parallel and serial merge techniques can be applied
to lists of both actual and equivalent links, the entire tree can
be processed to yield a single equivalent link array. The capacity
assignment in this array with the smallest cost, among those whose
delay does not exceed $t_{\max}$, is the optimal assignment for the
entire tree.

The size of the intermediate and final lists produced
by the parallel and serial merge techniques is of great
importance. The maximum list size appears to grow approximately
linearly as a function of the number of nodes, whereas the number
of possible assignments grows exponentially as a function of the
number of nodes. It takes a fraction of a second of computer
time on a CDC 6600 to optimize a 25 node tree. In addition,
problems with several hundred nodes have been run in a few
seconds. It appears that with careful programming and the
application of a number of "short cuts", networks with as many as
10,000 nodes can be handled within a few minutes of computer
time.
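
To show how the two merges combine over a whole tree, here is a compact Python sketch under stated assumptions. Every name in it is hypothetical, the small tree and its (delay, cost) options are invented, and the merges are simplified versions written from the descriptions above rather than the authors' program. A brute force enumeration is included only as a cross-check on small trees.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Options = List[Tuple[int, int]]   # (delay, cost), smallest capacity first

def prune(candidates: Options) -> Options:
    """Keep only candidates not beaten on both delay and cost."""
    kept: Options = []
    for delay, cost in sorted(candidates, key=lambda dc: (-dc[0], -dc[1])):
        while kept and kept[-1][1] >= cost:
            kept.pop()
        kept.append((delay, cost))
    return kept

def parallel_merge(arrays: List[Options]) -> Options:
    """Worst delay over sibling subtrees and total cost, by index promotion."""
    index = [0] * len(arrays)
    merged: Options = []
    while True:
        current = [arrays[k][index[k]] for k in range(len(arrays))]
        worst = max(range(len(arrays)), key=lambda k: current[k][0])
        merged.append((current[worst][0], sum(c for _, c in current)))
        if index[worst] == len(arrays[worst]) - 1:
            return prune(merged)
        index[worst] += 1

def serial_merge(link: Options, below: Options) -> Options:
    """Delays and costs add along a path."""
    return prune([(d1 + d2, c1 + c2) for d1, c1 in link for d2, c2 in below])

def equivalent_array(node: int, children: Dict[int, List[int]],
                     options: Dict[int, Options]) -> Options:
    """Equivalent array for the subtree that hangs from node's upward link."""
    kids = children.get(node, [])
    if not kids:
        return prune(options[node])
    below = parallel_merge([equivalent_array(k, children, options) for k in kids])
    return serial_merge(options[node], below)

def least_cost(children, options, t_max) -> Optional[int]:
    """Cheapest total cost meeting t_max, or None if no assignment does."""
    top = parallel_merge([equivalent_array(k, children, options)
                          for k in children[1]])
    feasible = [cost for delay, cost in top if delay <= t_max]
    return min(feasible) if feasible else None

def brute_force(children, options, t_max) -> Optional[int]:
    """Exhaustive check of every capacity assignment (small trees only)."""
    parent = {kid: node for node, kids in children.items() for kid in kids}
    nodes = list(options)
    best = None
    for combo in product(*(options[n] for n in nodes)):
        chosen = dict(zip(nodes, combo))
        def to_center(n):
            return 0 if n == 1 else chosen[n][0] + to_center(parent[n])
        if max(to_center(n) for n in nodes) <= t_max:
            total = sum(cost for _, cost in combo)
            best = total if best is None else min(best, total)
    return best

# Hypothetical tree: node 1 is central; options[n] belongs to the link from n upward.
children = {1: [2, 5], 2: [3, 4]}
options = {2: [(50, 4), (30, 7), (20, 12)], 3: [(60, 2), (40, 5), (25, 9)],
           4: [(45, 3), (35, 4), (15, 8)], 5: [(90, 1), (55, 3), (30, 6)]}
print(least_cost(children, options, 80), brute_force(children, options, 80))
```

The recursion treats each subtree as one equivalent link, so only the pruned lists are carried upward rather than all 81 assignments of this example.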

The above paragraphs describe an optimal method to
efficiently select link capacities for a specified tree. We now
give a heuristic method for finding low cost configurations. In
combination with Kleitman's capacity assignment algorithm, this
method appears to produce optimal or near optimal results in all
cases.


As before, we will say that a "feasible" network is one
which satisfies all of the network constraints and an "optimal"
network is a feasible network with the least possible cost. The
design method uses a starting routine to generate a feasible
starting network and an optimizing routine to examine networks
derived from the starting network by means of a "local" change in
network topology. If a feasible network with lower cost is found,
it is adopted as a new starting network and the process repeated.
Eventually, a locally optimal feasible network is reached, and the
entire procedure is repeated with a different starting network.
The starting network may be presented as an input or generated by
the computer.
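
The structure of this search can be sketched generically in Python. The sketch assumes the caller supplies a `neighbors` function (the local changes) and a `cost` function that returns None for an infeasible network; both names, and the toy integer problem used to exercise the loop, are hypothetical stand-ins for real network designs.

```python
from typing import Callable, Iterable, Optional, TypeVar

Network = TypeVar("Network")

def local_search(start: Network,
                 neighbors: Callable[[Network], Iterable[Network]],
                 cost: Callable[[Network], Optional[float]]) -> Network:
    """Descend until no neighboring network is both feasible and cheaper."""
    best, best_cost = start, cost(start)
    if best_cost is None:
        raise ValueError("the starting network must be feasible")
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(best):
            candidate_cost = cost(candidate)
            if candidate_cost is not None and candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
                improved = True
                break        # adopt it as the new starting network
    return best

# Toy stand-in: "networks" are integers, negative values are infeasible.
def toy_cost(x: int) -> Optional[float]:
    return None if x < 0 else float((x - 7) ** 2)

result = local_search(0, lambda x: [x - 1, x + 1], toy_cost)
print(result)
```

Repeating the call from several starting networks and keeping the cheapest result corresponds to the restart step described above.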


For the problem under consideration, an effective local
transformation is a special kind of elementary tree transformation
[7]. It can be shown that any tree can be obtained from any other
tree by a sequence of elementary tree transformations. The
elementary transformation used is as follows: For a given tree,
choose a node i and find the node $i_1$ closest to i but not already
connected to i. Add link $(i_1, i)$ to the tree and identify the
circuit formed. Suppose that this circuit consists of links
$(i_1, i)$, $(i, i_2)$, $(i_2, i_3)$, ..., $(i_j, i_1)$. New trees are
formed by deleting in turn links $(i, i_2)$, $(i_2, i_3)$, ..., and
finally $(i_j, i_1)$. Each time a link is deleted, Kleitman's
algorithm is applied to determine optimal link capacities and network
cost. As each node i is scanned in turn, link additions from i to its
d nearest neighbors are considered. Whenever L lower cost trees have
been generated, the node scan is begun again. (For offshore pipeline
design [5], d = 3 and L = 1.) This procedure has proven to be
extremely powerful in finding low cost trees. Whenever the problem
has been small enough to exhaustively find optimal trees, the method
has converged to the optimum solution. In only one case has a
situation been constructed in which the local transformation could
not produce the known lowest cost network.
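
The elementary tree transformation can be sketched in Python as follows. The sketch covers only the topological step: the caller is assumed to have already chosen the nearby node $i_1$ (distances are not modeled), and the capacity assignment applied to each new tree is omitted. All names and the sample tree are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Link = FrozenSet[int]

def tree_path(links: Set[Link], start: int, goal: int) -> List[int]:
    """Nodes on the unique tree path from start to goal (depth-first search)."""
    adjacent: Dict[int, List[int]] = {}
    for link in links:
        a, b = tuple(link)
        adjacent.setdefault(a, []).append(b)
        adjacent.setdefault(b, []).append(a)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adjacent.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")

def elementary_transformations(links: Set[Link], i: int, i1: int) -> List[Set[Link]]:
    """Add link (i1, i), then delete each other link of the circuit in turn."""
    new_link = frozenset((i1, i))
    if new_link in links:
        raise ValueError("i1 must not already be connected to i")
    path = tree_path(links, i, i1)       # with new_link, this closes the circuit
    circuit_links = [frozenset(pair) for pair in zip(path, path[1:])]
    return [(links - {old}) | {new_link} for old in circuit_links]

# Hypothetical five node tree: 1-2, 2-3, 3-4, 3-5.
tree = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (3, 5)]}
new_trees = elementary_transformations(tree, i=4, i1=1)
print(len(new_trees))
```

Adding link (1,4) closes the circuit 4-3-2-1-4, so three new trees result, one for each of the deleted links (4,3), (3,2) and (2,1).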

This report discusses the analysis and optimization of the
ARPA Computer Network. The general design philosophy followed
as well as the specific elements considered in the implementation
of this philosophy are described. Relationships between traffic
level, link capacities, and cost as a function of the number of
nodes in the networks have been investigated. Extensive studies
have been made for twelve, sixteen, eighteen, and twenty node
networks where each node was a potential site for the ARPA
Network. Results of these studies are summarized. Methods for
optimizing the design of centralized networks have been
discovered. These methods, which are described here, are presently
being used to design large decentralized networks.